ABSTRACT

This report summarizes the discussion of a workshop on the Architecture
and Application of Digital Modules that was held on June 7-8, 1973, at
Carnegie-Mellon University. The purpose of the workshop was to identify
the major influences that continuing advancements in semiconductor
technology will have on the next generation of digital systems. The
workshop, and this report, can be approximately partitioned into three
main topics: discussion of current register-transfer level module sets and
what can be learned from their development and use; the state of
semiconductor technology and its current trends; and finally, discussion
of current efforts to define or build computer structures that may become
prototypes of the next generation of digital systems.


1. INTRODUCTION

Modules for computer system design are becoming increasingly complex,
driven by decreasing cost and size of hardware and increasing computer
system performance requirements. Standard modules have evolved from circuit
elements to gates and flip-flops to integrated-circuit chips to
register-transfer level module sets. Because of the continuing development
of semiconductor technology, LSI components (e.g., memory chips with ≥ 1K
bits and microprocessors) may become the standard components of digital
design. Are these memory arrays and microprocessors the right set of large
modules to use in the next generation of digital system design? To discuss
this and related questions, a workshop on the Architecture and Application
of Digital Modules was held on June 7-8, 1973, at Carnegie-Mellon
University. To ensure as wide a range of perspectives as possible,
participants were invited from computer manufacturers, semiconductor
manufacturers, and universities. (See the appendix for the list of
participants.)

The workshop, and this report, can be approximately partitioned into three
main topics: discussion of current register-transfer level module sets and
what can be learned from their development and use; the state of
semiconductor technology and its current trends; and finally, discussion of
efforts to define or build computer structures that may become prototypes
of the next generation of digital systems. The final section of this report
attempts to summarize the major observations of the workshop. While these
observations lack a degree of quantitative precision that might be desired,
they are general, qualitative statements that withstood the sometimes
heated debate of the workshop.* The major purpose of this report is to make
these observations available to a larger group than just the workshop
participants, and, we hope, to stimulate further investigation now that
these statements are in black and white rather than merely circulating as
folklore at informal workshops.

*While the authors cannot accept credit for all the observations reported
here, we do bear responsibility for any errors or distortions that may be
present.

2. EXISTING REGISTER-TRANSFER LEVEL MODULE SETS

Several register-transfer level modular systems have been developed in
the last six years. By a modular system we mean a small set of modules that
adhere to some intermodule communication protocol and are interconnected
using a small set of rules to produce a system which performs the desired
algorithm. Typically these systems are divided into a control part and a
data part. The first such module set was the macromodules developed at
Washington University in 1967 [Clark, et al., 1967].

Macromodules consist of a set of data and control modules that can be
stacked together; stacking defines implicit data and control
interconnections between adjacent modules. Arbitrary pathways can be
established by interconnecting modules with data and/or control cables.
Because a macromodule system has several buses (or data paths), a high
degree of concurrency is available. The major goal of the macromodule
project is to provide a set of easily used modules (as typified by the
number of modules, data cables, and control sequences) that can also handle
indefinite expandability (such as variable word length).

In 1971 a set of Register Transfer Modules (RTM's) became available
from Digital Equipment Corporation (DEC) [Bell, et al., 1972]. RTM's were
designed by DEC, whose primary goal was to look for a means of incorporating
MSI in their line of module boards, and by Carnegie-Mellon University, whose
primary interest was the teaching of systematic logic design. Like
macromodules, RTM's use a distributed control scheme (currently there are
approximately half a dozen control module types). As an economic decision,
all the data modules (approximately a dozen data module types) were
interconnected via a single bus. However, provision exists for RTM systems
to have more than one data bus when increased performance is required.

Three other RT level modular systems were discussed at the workshop.
One is a system developed at the University of Washington which is similar
in concept to RTM's. However, a microprogrammed controller is used for
the control part (approximately 75 chips with 100-200 nsec to execute a
control step depending on the nature of the step). Data modules are
developed as the need arises by specifying a module to a computer aided
design package which then generates a wiring list. The major goal of this
effort is to provide support for medical experiments at the University of
Washington.

A set of asynchronous, distributed control modules is also being
developed by MIT [Patil and Dennis, 1972]. Another effort at the University
of Delaware has generalized the RTM control modules into a single universal
control module (two of which can fit in a 14 pin dual in-line package)
[Robinson, 1973]. Data parts are simply constructed from standard MSI chips
in the University of Delaware system.

One of the major goals of all these projects is to teach systematic design
of control logic. Semiconductor manufacturers currently offer a
comprehensive set of data-part packages (registers, shift registers, ALU's)
while offering a bewildering array of SSI packages to perform control
functions (RS, JK, Trigger flip-flops, etc.). By integrating these control
modules into convenient and economic packages the semiconductor
manufacturers could help reduce the pitfalls of conventional control logic
design. Even if the control modules are not made available as chips,
designers can still use the techniques typified by distributed,
asynchronous control to reduce design and debugging time.

Some of the most interesting discussions at the workshop included
comparisons of the cost, performance and design time of the two complete RT
level modular systems versus standard SSI/MSI designs.

First, with respect to cost, macromodules and RTM's seem more expensive
when compared to standard logic design. However, they owe a substantial
portion of their cost specifically to those features which make them modular
systems (to establish module protocol, to allow word extendability, etc.).
It was estimated that this cost was 50%-70% of the total cost of
macromodules and 30% of RTM's. A system built with macromodules might cost
between 2 and 10 times that of a comparable system built for the same task
in SSI and MSI components.

However, this extra cost is the payment necessary to achieve the design
goals of flexibility, very short design time, and expandability. The
advantages of short design and debugging time in a one-of-a-kind, quick
turnaround, experimental environment are obvious. It was stressed for both
macromodules and RTM's that the translation of an algorithm from paper
design to hardware, disregarding wiring errors, always produced a system
that operated as specified.

DEC has used RTM's as a breadboarding technique to debug new approaches
as well as to produce low volume, custom systems where engineering design
time is a major portion of the product cost. Presently, DEC has marketed
over 300 custom systems that have been designed and built with RTM's. A
typical system consisted of 50-100 steps, i.e., control modules; the largest
system built consisted of a little over 500 steps. Most RTM systems of more
than 100 steps use a ROM control unit rather than separate control modules
for each step.

To date, macromodules have been used extensively in a hybrid fashion:
coupled to a computer, they perform the small portion of the calculation
which consumes most of the time. Comparison of performance between design
with RT module sets and conventional logic is best seen by a number of
examples:

1. At Carnegie-Mellon University, a PDP-8 has been built with RTM's
in 55 control steps for double the cost and only 40% of the speed
of a real PDP-8. (The point of this PDP-8 example is that the major
area for RTM's is custom design, not general purpose computing. It
is difficult to envision a modular architecture which could offset
such a factor in speed and cost.)

2. Matrix multiply programmed on a small machine took 400 µsec, on
a CDC 7600 5 µsec and in macromodules 35 µsec.

3.  The  FFT  (Fast  Fourier  Transform)  butterfly  multiply  performed  in 
macromodules  was  comparable  in  execution  time  to  one  programmed 
on  the  CDC  6600. 

4. The major path of an electrocardiogram preprocessor took from
7 µsec (CDC 6600) to 37 µsec (PDP-9) when programmed in assembly
language on a general purpose computer. A macromodule system
took 3 µsec and a special purpose TTL design a projected 1½ µsec.

The last two examples illustrate that (1) RTM's and macromodules compete
successfully with general purpose computers when used in some high speed
applications, since hardwired implementations of the algorithms do not incur
the overhead of instruction fetch and decode, and (2) the modular systems
can exploit the parallelism in the algorithm that a standard
single-instruction-stream single-data-stream computer cannot.
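
The timings quoted in examples 2 and 4 can be restated as speed ratios. The
following is only a minimal sketch: the timings are copied from those
examples, while the function name `speedup` and the table layout are
illustrative assumptions and not part of the workshop material.

```python
# Execution times in microseconds, copied from examples 2 and 4 above.
MATRIX_MULTIPLY_US = {
    "small machine": 400.0,
    "CDC 7600": 5.0,
    "macromodules": 35.0,
}
ECG_PREPROCESSOR_US = {
    "CDC 6600": 7.0,
    "PDP-9": 37.0,
    "macromodules": 3.0,
}


def speedup(times_us, baseline, candidate):
    """Return how many times faster `candidate` runs than `baseline`."""
    return times_us[baseline] / times_us[candidate]


if __name__ == "__main__":
    for name, table in (("matrix multiply", MATRIX_MULTIPLY_US),
                        ("ECG preprocessor", ECG_PREPROCESSOR_US)):
        for machine in table:
            if machine != "macromodules":
                ratio = speedup(table, machine, "macromodules")
                print(f"{name}: macromodules vs {machine}: {ratio:.2f}x")
```

A ratio above 1 means the macromodule system is the faster of the two; for
the CDC 7600 matrix multiply the ratio falls below 1, which matches the
figures quoted in example 2.
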

3. SEMICONDUCTOR TECHNOLOGY

Several microprocessor chips (or small sets of chips) were described by
the representatives from the semiconductor manufacturers: specifically,
Intel's MCS-4 (4 bits/word) and MCS-8 (8 bits/word), National's 16 bits/word,
and American Microsystems' (AMI) 16 bits/word processors. For discussion of
these microprocessors see [Intel, 1972 A, B; National, 1972].

Two other microprocessors were discussed that are currently in various
stages of development: Intel's 8080 and SMS's bipolar microprocessor. The
Intel 8080 is an 8-bit MOS processor in a 40 pin package, 16 pins of which
are address lines. It has 7 8-bit registers and maintains a stack in memory.
Scientific Micro Systems (SMS) is exploring the feasibility of a small
(800-1000 gates) bipolar microprocessor with a 250 nanosecond cycle time, as
compared to the MOS cycle time of about 1 microsecond. The objective is to
design initially for speed and trade it for other capabilities later.

Several  future  trends  are  apparent  in  the  semiconductor  industry: 

1. Since about 1960 the commercially feasible chip complexity (i.e.,
number of devices per chip) has roughly doubled every one to two years.
In regular logic the 4K bit RAM (13,000 devices) was introduced roughly
2½ years after the 1K bit (4000 devices) RAM. The doubling effect also
holds for random logic. The 4 bit/word Intel MCS-4 microprocessor has
~ 2300 devices. The Intel 8080 will be introduced about two years
after the MCS-4 and will contain ~ 4500 devices.

2. The regular pattern chips (e.g., memories) have about four times
the density of random logic chips (e.g., processors) for the same
manufacturing complexity. For example, the Intel MCS-4 (4 bits/word)
processor is about as difficult to produce as a 1K-2K RAM or
~ 4000-8000 devices. The Intel 8080 (8 bits/word) is on the order
of complexity of a 4K RAM or ~ 13,000 devices. If this relation
continues to hold in coming years, we can expect to see microprocessors
equivalent in complexity and cost to ~ 500 memory words (of the
same size as the processor's data path), which is less than we might
predict based on current minicomputer systems (i.e., 4K to 32K words).

3. The chip complexity achievable in bipolar technology usually lags MOS
technology by two years. Hence MOS memories tend to be four times the
size of bipolar memories. The largest MOS RAM currently available is
4K while for bipolar RAM's it is 1K. Only in the area of ROM's is
bipolar density comparable to MOS. Since the increase in density of
bipolar technology tracks that of MOS, the present 100-200 chip
bipolar minicomputers can be expected to decrease by a factor of two in
chip count per year provided the semiconductor manufacturers can
provide the proper chips.

4. MOS technology is approaching bipolar speeds. Currently n-channel
speeds are comparable to TTL. The major constraint on speed is heat
dissipation, which is limited to less than one watt/package for air
cooling.

5. Production of an LSI chip, as typified by a microprocessor, is not
a small undertaking. Once the architecture is specified, it is
5-10 man-years before the component is ready to go into production.
Detailed logic design, simulation, layout, initial runs, and
debugging consume most of the time. Largely because of this long
and costly development time, semiconductor companies look for
components with a large volume market. For example, in 1972
approximately two million 1K bit MOS RAM memories were sold. Now if we
contrast this with the present minicomputer market, which is on the
order of 30,000 units/year, it is not difficult to understand why
the semiconductor manufacturers are reluctant to develop a
minicomputer on a chip. The microprocessors that have been designed
are for mass markets such as personal calculators, terminals and
controllers. The popularity of 4 and 8 bits/word microprocessors
is largely the result of the calculator and terminal markets,
respectively.
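
Trend 1 can be read as steady exponential growth in devices per chip. The
sketch below computes the doubling period implied by the MCS-4 and 8080
device counts quoted above and then extrapolates a device count. The
function names and the four-year extrapolation horizon are illustrative
assumptions; the text itself gives only the one-to-two-year doubling range
and the two device counts.

```python
import math


def implied_doubling_period(devices_then, devices_now, years_elapsed):
    """Doubling period (in years) implied by two device counts."""
    return years_elapsed / math.log2(devices_now / devices_then)


def projected_devices(devices_now, years_ahead, doubling_period):
    """Extrapolate a device count assuming steady exponential growth."""
    return devices_now * 2.0 ** (years_ahead / doubling_period)


if __name__ == "__main__":
    # MCS-4 (~2300 devices) to 8080 (~4500 devices), about two years apart.
    period = implied_doubling_period(2300, 4500, 2.0)
    print(f"implied doubling period: {period:.2f} years")
    # Hypothetical extrapolation, bracketed by the one- and two-year bounds.
    for assumed_period in (1.0, 2.0):
        print(assumed_period, projected_devices(4500, 4.0, assumed_period))
```

The two microprocessor data points sit at the slow end of the quoted range,
with an implied doubling period of slightly more than two years.
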

It is interesting to note that several techniques that have been used
in the architecture of large computers are being employed or seriously
considered for use in microprocessors. Pipelining and microprogramming are
two examples. Also, since the ratio of off-chip to on-chip line capacitance
may be as large as 10:1 (with a subsequent decrease in speed and increase in
driver capacity), on-chip memory in the form of a cache, or some other form
of high-speed scratchpad, looks attractive.

4. PMS* LEVEL MODULES

Given the technological trends outlined in the previous section, how do
we capitalize on them in the design of future computer structures? The
emergence of the microprocessors just discussed suggests that an obvious
"large" control module would be a microprocessor. Although there has been
considerable discussion of multiple processor systems in the past, there
has not been the widespread application of multiple micro-, mini-, or
macro-processor systems to give us a solid foundation from which to judge
microprocessors as basic modules of design. The potential for high
reliability, incremental expandability, and very high throughput is clear;
the problem centers around how to interconnect the microprocessors
economically and program them to cooperate effectively. Although we have no
easy answer to the above problems, the workshop did isolate and discuss the
following efforts in multiprocessor/multicomputer design as potential
prototypes of systems built from "PMS modules", i.e., LSI microprocessors
and memories.

4.1. Computer Networks

One possible prototype is the computer network as exemplified by several
loop systems and the ARPA network [Pierce, 1972; Farber and Larson, 1972;
Roberts and Wessler, 1970]. The links between computers are fixed and
messages are passed via store and forward schemes. Data is sent serially at
rates of 100 to 2000 KHz; response time is on the order of 100 to 1000
milliseconds. These performance measures indicate present computer networks
are too "loosely coupled" to be considered as prototypes of high performance
computer structures built from PMS modules.

*Processor-Memory-Switch. For a general description of the relation of this
level of description of computer structure to other levels, such as register
transfer, see [Bell and Newell, 1971].

4.2. C.mmp, A Multi-Mini-Processor

C.mmp is a multiprocessor computer system currently under construction
at Carnegie-Mellon University [Wulf and Bell, 1972]. It consists of up to
16 processors (modified PDP-11's) communicating through a central
crosspoint switch to 16 memory modules. See Figure 4.1 for an overview of
the structure of C.mmp.*

Three aspects of the C.mmp project are particularly relevant to this
discussion. First, C.mmp achieves a much "tighter coupling" among its
processors than computer networks because it can effectively pass a data
structure between processors by passing a pointer to the data via an
interprocessor interrupt. Estimates indicate it will take at least 300 µsec
for jobs to communicate via the interprocessor interrupt because of the need
to do a context swap at the target processor.

Second, C.mmp is a standard multiprocessor system in the sense that
all the processors share the same physical address space. The time to access
addressable data is independent of where it resides in physical memory.
However, cache memories have been proposed to exploit the "locality" of
programs and hence increase the performance of the system. The cache memories
would hold read-only segments for the processors. A hit in the cache would
eliminate the need for a processor to send a request through the crosspoint
switch to access an operand in memory. This saving could be significant
since the delays through the switch are about the same as the access time
of the memory (250 ns).
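
The potential saving can be illustrated with a simple mean access time
model. This is only a sketch under stated assumptions: the 250 ns memory
access time comes from the text, the switch delay is taken as equal to it
("about the same"), and the cache access time and hit ratio passed to the
function are hypothetical values, since the workshop reported neither. The
model also assumes that a miss costs no extra time for the failed cache
lookup.

```python
MEMORY_ACCESS_NS = 250.0  # memory access time quoted in the text
SWITCH_DELAY_NS = 250.0   # assumed equal to the memory access time


def mean_access_ns(hit_ratio, cache_access_ns):
    """Mean operand access time for a processor with a private cache.

    A hit is served by the cache alone; a miss pays the crosspoint switch
    delay plus the memory access time.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie between 0 and 1")
    miss_ns = SWITCH_DELAY_NS + MEMORY_ACCESS_NS
    return hit_ratio * cache_access_ns + (1.0 - hit_ratio) * miss_ns


if __name__ == "__main__":
    # Hypothetical 100 ns cache at several hypothetical hit ratios.
    for ratio in (0.0, 0.5, 0.9):
        print(ratio, mean_access_ns(ratio, 100.0))
```

With no hits the model reduces to the switch delay plus the memory access
time, which is the uncached case described above.
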

*We use the notation of Bell and Newell [1971] in this paper to describe the
structure of computer systems.

Finally, the address space of a standard PDP-11 (and other 16-bit
minicomputers) is only 64K bytes, yet the need was immediately felt for an
address space in C.mmp on the order of 2M (million) bytes. A set of
relocation registers is used in C.mmp to map the smaller address space of a
processor into the larger physical address space of the system. The
exploitation of process locality and the requirement of a physical address
space larger than any individual processor's virtual address space are common
themes we will see again in the other two systems discussed in this report.
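
The relocation idea can be sketched as a table lookup on the high-order bits
of the virtual address. The 16-bit virtual address and the 2M byte physical
space come from the text; the 8K byte page size, the class name
`RelocationMap`, and the register layout are assumptions made for
illustration and are not claimed to match the C.mmp hardware.

```python
VIRTUAL_BITS = 16                  # 64K byte virtual space, from the text
PAGE_BITS = 13                     # assumed page size of 8K bytes
PHYSICAL_BYTES = 2 * 1024 * 1024   # about 2M bytes, from the text


class RelocationMap:
    """One relocation register per virtual page (illustrative only)."""

    def __init__(self, frames):
        pages = 1 << (VIRTUAL_BITS - PAGE_BITS)
        if len(frames) != pages:
            raise ValueError(f"expected {pages} relocation registers")
        for frame in frames:
            if not 0 <= (frame << PAGE_BITS) < PHYSICAL_BYTES:
                raise ValueError("frame lies outside physical memory")
        self.frames = list(frames)

    def translate(self, vaddr):
        """Map a 16-bit virtual byte address to a physical byte address."""
        if not 0 <= vaddr < (1 << VIRTUAL_BITS):
            raise ValueError("virtual address out of range")
        page = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        return (self.frames[page] << PAGE_BITS) | offset


if __name__ == "__main__":
    # Hypothetical register contents: the top page maps to the last frame.
    rmap = RelocationMap([0, 1, 2, 3, 4, 5, 6, 255])
    print(hex(rmap.translate(0xE005)))
```

Changing the contents of one register moves an entire virtual page to a
different region of physical memory without altering the program's own
addresses.
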

4.3. HSM IMP: Bolt, Beranek and Newman's Multiprocessor IMP

BBN is designing a highly reliable and modular multiprocessor to replace
the Interface Message Processors (IMP's) at certain ARPA network nodes. The
task is special purpose and the cost is expected to be $100,000 for a 14
processor system [Heart, et al., 1973]. The structure of the HSM (High Speed
Modular) IMP is shown in Figure 4.2.

One of the main differences between the HSM IMP and C.mmp is that the HSM
IMP has no centralized crosspoint switch. The initial design has two memory
buses (each housing part of the shared memory) and seven processor buses (each
with up to four processors and a small amount of local memory). Processor
buses are connected to memory buses through bus couplers that map addresses
that are not references to local memory from processor buses to memory buses.
As in C.mmp, a relocation - or address mapping - unit is used to translate the
smaller virtual address space of the 16-bit processor (a Lockheed SUE
processor in this case) into the larger physical address space of the system.

Any processor bus can be connected to any number of memory buses and any
memory bus can be connected to any number of processor buses. Memory and
processor buses can also be connected to an I/O bus. Hence the bus couplers
constitute a distributed crosspoint switch; each processor that wants to talk
to a memory is simply connected to that memory bus. The bus couplers are an
interesting alternative to the centralized crosspoint switch of C.mmp. While
the bus couplers provide a very modular switching scheme, they achieve this
modularity through a proliferation of cables. Programs for the HSM IMP are
written so that the most frequently accessed code is in local memory attached
to the processor bus, and less frequently accessed code and operands are in
common, shared memory along with all I/O buffers. The use of local and shared
memory in the HSM IMP is in contrast with the homogeneous shared memory in
C.mmp: the HSM IMP is being programmed for a specific task* - message
handling in the ARPA network - while C.mmp is being developed as a general
purpose computational facility.

An interesting innovation in the HSM IMP is the pseudo interrupt device
(PID). The PID is basic to the sequencing of tasks (or control) of the HSM
IMP. Any processor can store an integer in the PID, and when the PID is "read"
by any processor it returns, and then deletes, the highest integer stored. The
processors use the PID as a high speed, priority-ordered queue of pending
tasks. The PID is fundamentally different from the direct
processor-to-processor interrupts of C.mmp.
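
The behavior described for the PID is that of a max-priority queue. The
following software sketch models only that store/read behavior; the class
and method names are hypothetical, the value returned when nothing is
pending (`None`) is an assumption, and the hardware's handling of duplicate
integers and of simultaneous accesses by several processors is not described
in the text and is not modeled here.

```python
import heapq


class PseudoInterruptDevice:
    """Software model of the PID's store and read operations."""

    def __init__(self):
        # heapq is a min-heap, so integers are stored negated to make
        # the largest stored integer come out first.
        self._heap = []

    def store(self, task_number):
        """Record a pending task, identified by an integer priority."""
        heapq.heappush(self._heap, -task_number)

    def read(self):
        """Return and delete the highest integer stored, or None if empty."""
        if not self._heap:
            return None
        return -heapq.heappop(self._heap)


if __name__ == "__main__":
    pid = PseudoInterruptDevice()
    for task in (3, 7, 5):
        pid.store(task)
    print(pid.read(), pid.read(), pid.read(), pid.read())
```

Any idle processor can call `read` to pick up the most urgent pending task,
which is how the text describes the processors using the device.
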

4.4.  Computer  Modules 

The final scheme discussed, termed "computer modules" (CM's), is
being developed at Carnegie-Mellon University [Bell et al., 1973; Fuller
and Chen, 1973; Fuller, Siewiorek and Swan, 1973]. The structure of a typical
CM network is shown in Figure 4.3.

Basically, CM's are processor-memory pairs with several special ports, or
bus interfaces. There is no central, shared memory in the sense of C.mmp or
HSM IMP (i.e., memory modules not specifically associated with any
processor).

*While the HSM IMP is being developed for a specific task, it is nonetheless
believed by its designers to be applicable to a wide spectrum of tasks.

Figure 4.3 The General Structure of a Computer Module System


The physical address space in a CM system is the sum of the local memories of
the CM's making up the system (Figure 4.3). As in C.mmp and the HSM IMP, each
CM processor has a small virtual address space (64K bytes) and a mapping unit
(in this case the bus interface) that translates virtual addresses into the
large physical address space. The bus interfaces, or simply D.map's, provide
access to inter-CM buses. A D.map monitors the intra-CM bus for addresses that
are within segments tagged for translation. Upon recognizing such an address,
the D.map maps it into the inter-CM bus address space. Similarly, D.map's may
also monitor the inter-CM bus and, upon recognizing an address, map it into
the intra-CM bus address space. Thus a CM can request an address and, if the
mapping registers are set appropriately, have it mapped across several
inter-CM buses (and through several CM's) before reaching the desired word of
physical memory. Whereas computer networks need cooperation from remote
processors to send a message, a processor in a CM can access a remote CM's
memory without the cooperation of the remote processor.
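
The chained translation described above can be sketched as repeated window
matching on each bus. Everything in this example beyond the idea of
recognizing and remapping an address is a hypothetical illustration: the bus
names, window sizes, the `Window` record, and the hop limit are assumptions
and do not describe the actual D.map registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """An address range on one bus that is remapped onto another bus."""
    base: int
    size: int
    target_bus: str
    target_base: int


def resolve(bus, address, windows, max_hops=8):
    """Follow D.map-style windows until the address is local to a bus.

    `windows` maps a bus name to the windows monitored on that bus.
    Returns the (bus, address) pair where the reference is satisfied.
    """
    for _ in range(max_hops):
        for window in windows.get(bus, []):
            if window.base <= address < window.base + window.size:
                offset = address - window.base
                bus, address = window.target_bus, window.target_base + offset
                break
        else:
            # No window recognized the address: it is local to this bus.
            return bus, address
    raise RuntimeError("mapping did not terminate within max_hops")


if __name__ == "__main__":
    windows = {
        "cm0": [Window(0xC000, 0x1000, "inter0", 0x20000)],
        "inter0": [Window(0x20000, 0x1000, "cm1", 0x4000)],
    }
    print(resolve("cm0", 0xC010, windows))
    print(resolve("cm0", 0x0100, windows))
```

In this hypothetical configuration a reference issued inside `cm0` reaches a
word in the local memory of `cm1` without any code running on `cm1`, which
is the property the text contrasts with message passing in computer
networks.
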

5. SUMMARY OF MAJOR OBSERVATIONS

The following observations are an attempt to state the major
conclusions of the discussion at the workshop. These are not meant to be a
comprehensive set of comments on RT-level modules, semiconductor technology,
or PMS-level modules, but only those observations that were vague or
controversial enough to warrant discussion at the workshop.

5.1.  RT-Level  Modules 

1. Semiconductor manufacturers currently provide an adequate, and growing,
set of RT-level (i.e., MSI) components to handle the standard data
operations such as storage, addition, shifting, etc. However, there
is a perplexing lack of RT-level control components to handle
control operations. This cannot be excused for lack of understanding
of RT-level control components. Bell et al. [1972], Clark et al.
[1967], Dennis and Patil [1972], and Robinson [1973] have all
demonstrated workable sets of control modules.

2.  The  "overhead"  in  hardware  required  to  transform  a  unit  of  logic 
into  a  module  that  observes  a  practical  inter-module  protocol  is 
commonly  on  the  same  order  of  cost  and  complexity  as  the  original 
logic.  In  many  cases  this  is  a  small  price  to  pay  for  the  drastic 
reduction  in  design  time.  In  any  event,  this  factor  should  be  kept 
in  mind  as  future  sets  of  modules,  and  future  applications  of  modular 
systems,  are  considered. 

5.2. Semiconductor Technology

1. The complexity of practical semiconductor components is doubling
every one to two years. The industry's current limits in MOS
manufacturing ability are chips that contain 4K bit random access
memories and 8 bits/word microprocessors.

2. Random logic components (e.g., microprocessors) have consistently
followed regular logic components (e.g., memories) by a factor of
four in complexity. One consequence of this is that a 4 or 8 bit
microprocessor is roughly equivalent to 500 4 or 8 bit words of
random access memory, respectively.

3. A semiconductor chip that has the potential sales volume of the
current minicomputer market, i.e., about 30,000 units/year, would
not be economically feasible to produce. The major consequence of
this is that microprocessors in the foreseeable future will be
designed for such mass markets as personal calculators and intelligent
terminals.
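
The "roughly 500 words" figure in observation 2 follows from the RAM
equivalents quoted in Section 3: the MCS-4 (4 bits/word) is about as
difficult to produce as a 1K-2K bit RAM, and the 8080 (8 bits/word) is on
the order of a 4K bit RAM. The short check below assumes that "1K" means
1024 bits; the function name is illustrative.

```python
K = 1024  # assumption: "1K" means 1024 bits


def equivalent_words(ram_bits, word_bits):
    """Number of words of `word_bits` bits that fit in `ram_bits` bits."""
    return ram_bits // word_bits


if __name__ == "__main__":
    # MCS-4: 4 bits/word, about as hard to make as a 1K-2K bit RAM.
    print(equivalent_words(1 * K, 4), equivalent_words(2 * K, 4))
    # 8080: 8 bits/word, on the order of a 4K bit RAM.
    print(equivalent_words(4 * K, 8))
```

Under that assumption the MCS-4 corresponds to 256-512 four-bit words and
the 8080 to 512 eight-bit words.
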

5.3. PMS-level Modules

An observation from current developments in the semiconductor industry
is that small microprocessors are the most obvious LSI control module. The
following comments concern the problems of building computer structures with
microprocessors and other LSI components, e.g., random access memories and
read only memories.

1. There have been significant efforts in the past to decompose
algorithms into parallel processes. We know how to parallelize at a small
grain (arithmetic expressions in the 360/91 at the instruction level)
and a large grain (tasks in a multiprogramming system at the several
100's to 1000's instruction level). At the intermediate level of
problem granularity there has been little progress made with a general
solution. However, a number of specific and important applications
have been studied and are known to decompose efficiently into parallel
tasks, e.g., weather simulation, signal processing, airline reservation
systems, message switching, and many vector and string processes.
Since a number of the applications that can be decomposed into parallel
processes are sufficiently important, they justify work in multiple
processor systems and encourage work in the development of parallel
algorithms for other applications.

2. Multiple microprocessor systems should have some form of local memory
and attempt to exploit any locality present in jobs to minimize the
inherent switching delays associated with multiple processors accessing
a central, shared memory. In special purpose tasks, such as an IMP,
an a priori analysis of the code can identify the commonly used
segments of a program; in a general purpose application some automatic,
dynamic scheme (such as the C.mmp cache proposal) must be used.

3. Computer structures will often require a physical address space much
larger than the virtual address space of an individual microprocessor.
Some convenient, high performance method must be used to provide a
mapping from the small microprocessor address space to the larger
physical address space.

4. Inter-(micro)processor communication is one of the least understood
issues in multiprocessor systems. Hopefully experience with the various
intercommunication schemes in C.mmp, HSM IMP, CM's, and other
multiprocessor structures will provide a basis for further work in
this area.